ABSTRACT

We consider the problem of maximizing the long-run average return
in a single server queueing reward system in which the customer's offer
of a joint distribution of reward and service time required to earn this
reward is independent of the renewal process which governs customer
arrivals. After formulating the problem as a semi-Markov decision
process, we characterize the form of an optimal policy. When the
renewal process is Poisson, the characterization is easily stated:
accept a customer if and only if the ratio of his expected reward to
his expected service time is larger than g, the long-run average
return. When the arrival process is Poisson, g is easily found.

Next, batch arrivals are permitted, and further results are obtained.
1. Introduction

We consider the problem of maximizing the long-run average return
in a single server queueing reward system in which the customer's offer
of a joint distribution of reward and service time required to earn this
reward is independent of the renewal process which governs customer
arrivals. In describing the model, we find it enlightening to introduce
the necessary notation and terminology in the context of a problem
which we refer to as "the streetwalker's dilemma."

Consider a streetwalker working in a large city, and suppose that
her customers arrive according to a renewal process having interarrival
distribution $F$ with $F(0) = 0$. Each arriving customer makes an offer
which she must either accept or reject, and all customers who arrive
while she is busy or whose offer she has rejected are assumed lost. Thus
pre-emption and backlogging are not permitted. If she accepts a customer
(i.e., an offer) of type $x$, $-\infty < x < \infty$, then the probability
that she will receive no more than $s$ dollars and that the service time
required to earn this reward will not exceed $t$ is given by the joint
distribution $G_x(s,t)$. Furthermore, the distribution function $H$ of the
type of offer she receives is independent of the renewal process and of her
past decisions, and hence successive offers are independent and identically
distributed. The streetwalker's dilemma, then, is to decide which
customers to accept and which customers to reject so as to maximize her
long-run average return.

The model can be viewed as one for determining whether or not a
factory or job shop should accept potential jobs. Several other interesting
examples of this model are given by Miller [2, pp. 67-70]. The fundamental
difference between our model and Miller's [2] is that his is
restricted to (i) exponential service time which is assumed independent
of the customer type, (ii) Poisson arrivals, and (iii) a finite number of
customer types. On the other hand, Miller has the added generality of
allowing many servers.

In section 2 we formulate the problem as a semi-Markov decision
process and introduce the necessary notation. Employing recent results
due to Ross [4], we determine the structure of an optimal policy in
section 3. Next, we specialize to the case of Poisson arrivals and prove
a monotonic property which enables us to easily calculate, in practice,
the optimal policy. Finally, we allow batch arrivals, and again determine
the structure of an optimal policy.

2.  Notation  and  Definitions 

In  characterizing  the  structure  of  an  optimal  policy,  it  behooves 
us  to  formulate  our  model  as  a  semi-Markov  decision  process. 

Definition. A semi-Markov decision process is a process with state space
$S$ and action space $A$. Whenever a transition to some state occurs, an
action is chosen. If the state is $x$ and action $a$ is chosen, then

(i) the next state of the system is given by the distribution
function $P_{x,a}(\cdot)$,

(ii) conditional on the event that the next state is $y$, the time
until the transition from $x$ to $y$ occurs is a random variable
with distribution function $F_{x,y,a}(\cdot)$,

and

(iii) there is a reward earned at the time the action is taken, and
it is a random variable depending on $x$ and $a$.

When the transition times are identically one, that is, when
$F_{x,y,a}(t)$ equals 0 for $t < 1$ and 1 for $t \ge 1$, then we have
the more familiar Markov decision process.

We shall say that an offer is made only when a customer arrives and
finds the streetwalker free. Then in our model, the process is said to
be in state $x$ at time $t$ if the last offer made at or prior to time
$t$ was an offer of type $x$, so $S = (-\infty, \infty)$. We say that a
transition to state $y$ occurs at time $t$ if an offer of type $y$ is made
at time $t$. In each state there are two possible actions: accept (action 1)
and reject (action 2). Thus, $A = \{1, 2\}$. The reward earned for rejecting
an offer is zero, while the reward earned for accepting an offer of
type $x$ has distribution function $R_x(\cdot)$ given by

$$R_x(s) = \int_0^\infty dG_x(s,t).$$

The transition function $P_{x,a}$ is independent of $x$ and $a$ and equals
$H$, whereas the distribution $F_{x,y,a}$ of time until the next transition
occurs equals $F$ if $a = 2$. We give $F_{x,y,a}$ later for the case $a = 1$.

A policy $\pi$ is any (possibly randomized) rule which for each $t \ge 0$
specifies which action to take at time $t$ given the current state and
the past decisions and history of the process. Of particular interest
are (nonrandomized) stationary policies which, independently of the time
$t$ and the past decisions and history of the process, simply specify
which action to take from each state. In our model, a stationary policy
separates the types of offers into two categories: those we always
accept and those we always reject.

For each policy $\pi$ and each state $x$, we define

$$\phi^1_\pi(x) = \liminf_{t \to \infty} \frac{E_\pi[Z(t) \mid X_1 = x]}{t} \tag{1}$$

and

$$\phi^2_\pi(x) = \liminf_{n \to \infty} \frac{E_\pi\left[\sum_{j=1}^{n} Z_j \mid X_1 = x\right]}{E_\pi\left[\sum_{j=1}^{n} \tau_j \mid X_1 = x\right]} \tag{2}$$

where $Z(t)$, $Z_j$, and $\tau_j$ denote, respectively, the total rewards
received by time $t$, the reward received during the $j$th transition
interval, and the length of the $j$th transition interval; $X_1$ is the
initial state.


The criterion given in (1) is the usual definition of the long-run
average return. The criterion given in (2) was first suggested by
Ross and is the limit of the ratio of the expected reward earned
during the first $n$ transitions to the expected time for the first $n$
transitions. Even though (1) is slightly more appealing than (2), we
shall adopt (2) as our definition of the long-run average return as it
is more amenable to analysis. We shall show, however, that the two
criteria are equivalent for stationary policies.

The problem, then, is to find a policy $\pi^*$, termed optimal, such
that

$$\phi^2_{\pi^*}(x) = \sup_{\pi} \phi^2_\pi(x) \quad \text{for all } x \in S.$$

Finally, let $R(x,1)$ and $R(x,2)$ denote the expected reward
received during a transition interval which begins with her acceptance
or rejection of an offer of type $x$, respectively. Also, denote by
$\tau(x,1)$ and $\tau(x,2)$ the expected length of a transition interval
which begins with her acceptance or rejection of an offer of type $x$,
respectively. Then

$$R(x,1) = \int_0^\infty \int_{-\infty}^{\infty} s \, dG_x(s,t),$$

$$R(x,2) = 0,$$

and

$$\tau(x,1) = \int_0^\infty \int_{-\infty}^{\infty} (t + EY_t) \, dG_x(s,t),$$

$$\tau(x,2) = \int_0^\infty y \, dF(y) \equiv \mu,$$

where $EY_t$ is the expected amount of time that she must wait (remain
idle) until she receives another offer given that she spent an amount
of time $t$ with her previous customer. ($Y_t$ is just the excess life
at time $t$ of the renewal process [3, p. 173].) Note too that
$\tau(x,1) \ge \mu$ for all $x$.


3.  Optimal  Policies 

Of considerable importance is the fact [4] that we can assume without
loss of generality that whenever action $a$ is taken in state $x$,
then the length of each transition interval is identically $\tau(x,a)$ and
the reward earned is identically $R(x,a)$. Using this fact, it follows
that $V_\alpha(x,n)$, the maximal expected $\alpha$-discounted reward earned
during the first $n$ transitions when $X_1 = x$, is given by
($V_\alpha(x,0) \equiv 0$)

$$V_\alpha(x,n) = \max\left\{ R(x,1) + e^{-\alpha\tau(x,1)} \int_{-\infty}^{\infty} V_\alpha(y,n-1)\,dH(y);\ e^{-\alpha\mu} \int_{-\infty}^{\infty} V_\alpha(y,n-1)\,dH(y) \right\}. \tag{3}$$

Note that $V_\alpha(x,n)$ is increasing in $n$ for each $x$.
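The recursion (3) can be iterated numerically when there are only finitely many offer types. The following is a minimal sketch under that assumption (the paper itself allows a continuum of types); the function name `value_iteration` and its arguments are hypothetical and chosen only for illustration, with `probs` playing the role of $H$, `rewards` of $R(x,1)$, `taus` of $\tau(x,1)$, and `mu` of $\mu$.

```python
import math


def value_iteration(probs, rewards, taus, mu, alpha, n_steps):
    """Iterate recursion (3) for finitely many offer types.

    probs[i]   : probability of offer type i (a discrete stand-in for H)
    rewards[i] : expected reward R(x_i, 1) for accepting type i
    taus[i]    : expected interval length tau(x_i, 1) after accepting type i
    mu         : expected interarrival time, i.e. tau(x, 2)
    alpha      : discount rate (> 0)
    Returns the list [V(., 0), V(., 1), ..., V(., n_steps)].
    """
    values = [0.0] * len(probs)  # V_alpha(x, 0) = 0
    history = [values]
    for _ in range(n_steps):
        # Discrete analogue of the integral of V_alpha(y, n-1) dH(y).
        continuation = sum(p * v for p, v in zip(probs, values))
        values = [
            max(
                r + math.exp(-alpha * tau) * continuation,  # accept
                math.exp(-alpha * mu) * continuation,        # reject
            )
            for r, tau in zip(rewards, taus)
        ]
        history.append(values)
    return history


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    hist = value_iteration([0.5, 0.3, 0.2], [1.0, 4.0, 9.0],
                           [2.0, 3.0, 4.0], mu=1.0, alpha=0.1, n_steps=50)
    print(hist[-1])
```

Each sweep reuses the same continuation value for every type, which reflects the fact that the next offer is drawn from $H$ regardless of the current state and action.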

Throughout  the  remainder  of  the  paper,  we  shall  assume  that  the 
following  condition  holds: 

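Condition 1: There is an $M < \infty$ such that $|R(x,1)| < M$ for all $x \in S$.

Lemma 1. The limit function $V_\alpha(x) \equiv \lim_{n \to \infty} V_\alpha(x,n)$
exists, is bounded in $x$, and satisfies the functional equation

$$V_\alpha(x) = \max\left\{ R(x,1) + e^{-\alpha\tau(x,1)} \int_{-\infty}^{\infty} V_\alpha(y)\,dH(y);\ e^{-\alpha\mu} \int_{-\infty}^{\infty} V_\alpha(y)\,dH(y) \right\}. \tag{4}$$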

Proof. Assume that $V_\alpha(x,n-1) \le M(1 - e^{-\alpha n \mu})/(1 - e^{-\alpha\mu})$.
Then since $V_\alpha(x,n-1) \ge 0$, $\tau(x,1) \ge \mu$ for each $x$, and
$R(x,1) \le M$, we have

$$\begin{aligned}
V_\alpha(x,n) &\le \max[R(x,1), 0] + e^{-\alpha\mu} \int_{-\infty}^{\infty} V_\alpha(y,n-1)\,dH(y) \\
&\le M + e^{-\alpha\mu} \sup_y V_\alpha(y,n-1) \\
&\le M + e^{-\alpha\mu} M(1 - e^{-\alpha n \mu})/(1 - e^{-\alpha\mu}) \\
&= M(1 - e^{-\alpha(n+1)\mu})/(1 - e^{-\alpha\mu}).
\end{aligned}$$

Thus, it follows that $V_\alpha(x,n)$ is uniformly bounded in $x$ and $n$.
Therefore, we can conclude that the limit exists and is bounded in $x$
since $V_\alpha(x,n)$ is increasing in $n$ for each $x$. The desired result
now follows by applying the Lebesgue dominated convergence theorem to
(3).

Q.E.D.
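For readers who want the algebra of the induction step in the proof of Lemma 1 spelled out, the following worked equation combines the two terms over the common denominator. It uses only the symbols $M$, $\alpha$, $\mu$, and $n$ from the proof; the final inequality is the uniform bound that the proof relies on.

```latex
\begin{aligned}
M + e^{-\alpha\mu}\,\frac{M\left(1 - e^{-\alpha n\mu}\right)}{1 - e^{-\alpha\mu}}
  &= \frac{M\left[\left(1 - e^{-\alpha\mu}\right) + e^{-\alpha\mu} - e^{-\alpha(n+1)\mu}\right]}{1 - e^{-\alpha\mu}} \\
  &= \frac{M\left(1 - e^{-\alpha(n+1)\mu}\right)}{1 - e^{-\alpha\mu}}
  \;\le\; \frac{M}{1 - e^{-\alpha\mu}} .
\end{aligned}
```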

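Lemma 2. For each pair $x, z$ in $S$ and all $\alpha > 0$,
$|V_\alpha(x) - V_\alpha(z)| \le M$.

Proof. Fix $x, z \in S$ and $\alpha > 0$, recall that $\tau(x,1) \ge \mu$, and
note that $\int_{-\infty}^{\infty} V_\alpha(y)\,dH(y) \ge 0$. By Lemma 1,
$V_\alpha(z) \ge e^{-\alpha\mu} \int_{-\infty}^{\infty} V_\alpha(y)\,dH(y)$, so
again by Lemma 1 we have either

$$\begin{aligned}
V_\alpha(x) &= R(x,1) + e^{-\alpha\tau(x,1)} \int_{-\infty}^{\infty} V_\alpha(y)\,dH(y) \\
&\le M + e^{-\alpha\tau(x,1)} \int_{-\infty}^{\infty} V_\alpha(y)\,dH(y) \\
&\le M + V_\alpha(z),
\end{aligned}$$

or

$$V_\alpha(x) = e^{-\alpha\mu} \int_{-\infty}^{\infty} V_\alpha(y)\,dH(y) \le V_\alpha(z).$$

In either case, we have $V_\alpha(x) - V_\alpha(z) \le M$, so the desired
result is obtained by reversing the roles of $x$ and $z$.

Q.E.D.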

Theorem 3. It is optimal to accept an offer of type $x$ if and only if

$$\frac{R(x,1)}{\tau(x,1) - \mu} \ge g,$$

where $g$ is the optimal long-run average reward, i.e.,
$g = \sup_\pi \phi^2_\pi(x)$ for all $x$.
Proof. It follows from Lemma 2 and Theorem 3 of reference 4 that there
is a bounded function $h$ and a constant $g$ such that

$$h(x) = \max\left\{ R(x,1) + \int_{-\infty}^{\infty} h(y)\,dH(y) - g\tau(x,1);\ \int_{-\infty}^{\infty} h(y)\,dH(y) - g\mu \right\} \quad \text{for all } x. \tag{5}$$

Finally, Theorem 2 of reference 4 states that if there is a bounded
function $h$ and a constant $g$ which satisfy (5), then there is a
stationary policy $\pi^*$ such that

$$g = \phi^2_{\pi^*}(x) = \max_\pi \phi^2_\pi(x) \quad \text{for all } x,$$

and $\pi^*$, for each $x$, prescribes an action which maximizes the right
side of (5).

Q.E.D.
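The acceptance rule of Theorem 3 can be read off from (5) by comparing the two terms inside the maximum. The short derivation below spells this out; the last equivalence assumes $\tau(x,1) > \mu$ strictly, which is an assumption made here for the division (the text only states $\tau(x,1) \ge \mu$).

```latex
\begin{aligned}
\text{accept } x
  &\iff R(x,1) + \int_{-\infty}^{\infty} h(y)\,dH(y) - g\,\tau(x,1)
        \;\ge\; \int_{-\infty}^{\infty} h(y)\,dH(y) - g\,\mu \\
  &\iff R(x,1) \;\ge\; g\left[\tau(x,1) - \mu\right] \\
  &\iff \frac{R(x,1)}{\tau(x,1) - \mu} \;\ge\; g
  \qquad (\text{assuming } \tau(x,1) > \mu).
\end{aligned}
```

The integral of $h$ cancels because the next offer is drawn from $H$ whatever action is taken, so only the immediate reward and the extra time spent matter.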

It can be shown [4, p. 5] that if the expected length of a transition
interval is finite for a stationary policy $\pi$, then $\phi^1_\pi = \phi^2_\pi$.
Moreover, in view of Condition 1, it is easy to show by a simple renewal
reward argument (see [1]) that if the expected length of a transition
interval is infinite for a stationary policy $\pi$, then
$\phi^1_\pi = \phi^2_\pi = 0$. This establishes Theorem 4.

Theorem 4. For each stationary policy $\pi$, $\phi^1_\pi = \phi^2_\pi$. Hence,
a best stationary policy in the sense of (1) is given by Theorem 3.


Poisson  Arrivals 

Of particular interest is the special case wherein the renewal
process of arrivals is a Poisson process with rate $\lambda$. Here,
$\mu = 1/\lambda$, and it follows from the memoryless property of Poisson
processes that $EY_t = 1/\lambda$ for each $t$. Hence,

$$\tau(x,1) = t_x + 1/\lambda,$$

where

$$t_x = \int_0^\infty \int_{-\infty}^{\infty} t \, dG_x(s,t)$$

is the mean time that the streetwalker spends with a customer of type $x$.
Theorem 3 now simplifies and takes on a more intuitive form: it is optimal
to accept an offer of type $x$ if and only if $R(x,1)/t_x \ge g$, that
is, if and only if the ratio of the mean reward to the mean service time
is at least as large as the long-run average reward.
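The Poisson setting is easy to check by simulation. The sketch below generates an actual Poisson arrival stream, discards arrivals that occur while the server is busy, and estimates the long-run average reward of a fixed accept/reject rule. It makes several simplifying assumptions that are not in the paper: finitely many offer types, and a deterministic reward and service time for each type. The function name `simulate_average_reward` and all numbers are hypothetical. For comparison, a renewal-reward argument gives the ratio of expected reward per transition interval to expected interval length, which for an accept set $A$ is $\sum_{x \in A} h_x R_x / (1/\lambda + \sum_{x \in A} h_x t_x)$.

```python
import random


def simulate_average_reward(probs, rewards, times, lam, accept, n_offers, seed=0):
    """Estimate reward per unit time under a stationary accept/reject rule.

    probs[i]   : probability that an offer is of type i
    rewards[i] : (deterministic) reward for serving type i
    times[i]   : (deterministic) service time for type i
    lam        : Poisson arrival rate
    accept[i]  : True if offers of type i are accepted
    """
    rng = random.Random(seed)
    type_ids = range(len(probs))
    clock = rng.expovariate(lam)  # first arrival; the server starts free
    total_reward = 0.0
    for _ in range(n_offers):
        x = rng.choices(type_ids, weights=probs)[0]
        free_at = clock
        if accept[x]:
            total_reward += rewards[x]
            free_at = clock + times[x]
        # Arrivals that occur while the server is busy are lost, so
        # advance to the first arrival at or after the server becomes free.
        clock += rng.expovariate(lam)
        while clock < free_at:
            clock += rng.expovariate(lam)
    return total_reward / clock


if __name__ == "__main__":
    probs, rewards, times, lam = [0.5, 0.3, 0.2], [1.0, 4.0, 9.0], [1.0, 2.0, 3.0], 1.0
    accept = [False, True, True]
    estimate = simulate_average_reward(probs, rewards, times, lam, accept, 100_000)
    exact = (0.3 * 4.0 + 0.2 * 9.0) / (1.0 / lam + 0.3 * 2.0 + 0.2 * 3.0)
    print(estimate, exact)
```

Because the simulation never uses the memoryless property directly, agreement between the estimate and the ratio is a modest sanity check that the idle time after any service has mean $1/\lambda$.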

Although we have determined the structure of an optimal policy, it
remains to determine $g$. We now establish a monotonic property which,
in practice, enables us to easily calculate $g$.

Theorem 5. Suppose the arrival (renewal) process is Poisson, and let
$g(c)$ be the long-run average reward when an offer of type $x$ is accepted
if and only if $R(x,1)/t_x \ge c$. Then $g(\cdot)$ is unimodal.

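Proof. Let $R = \{x : R(x,1)/t_x \ge c\}$, $R' = \{x : R(x,1)/t_x \ge c'\}$, and
$p = \int_R dH(x)$. Using abbreviated notation (inside the integrals, $R$ and
$t$ stand for $R(x,1)$ and $t_x$), we have (see [1])

$$g(c) = \frac{(1-p)\cdot 0 + p\int_R \frac{R}{p}\,dH}{(1-p)\mu + p\left(\mu + \int_R \frac{t}{p}\,dH\right)} = \frac{\int_R R\,dH}{\mu + \int_R t\,dH}. \tag{6}$$

Fix $c' > c$, and assume that $\int_{R-R'} dH > 0$; otherwise it is obvious
that $g(c) = g(c')$. Upon considerable rearranging of terms, we obtain

$$\operatorname{sign}\left[g(c') - g(c)\right] = \operatorname{sign}\left[g(c') - \frac{\int_{R-R'} R\,dH}{\int_{R-R'} t\,dH}\right]. \tag{7}$$

Also, the definition of $R$ and $R'$ yields

$$c \le \frac{\int_{R-R'} R\,dH}{\int_{R-R'} t\,dH} < c'. \tag{8}$$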
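The rearrangement behind the sign identity (7) can be spelled out as follows. The shorthand $A'$, $B'$, $a$, and $b$ is introduced here for illustration only and does not appear in the paper; the step also assumes $b > 0$, that is, that mean service times are positive on the set $R - R'$.

```latex
% Shorthand (hypothetical, for this derivation only):
%   A' = \int_{R'} R\,dH,   B' = \mu + \int_{R'} t\,dH,
%   a  = \int_{R-R'} R\,dH, b  = \int_{R-R'} t\,dH > 0.
\begin{aligned}
g(c') &= \frac{A'}{B'}, \qquad g(c) = \frac{A' + a}{B' + b}, \\
g(c') - g(c) &= \frac{A'(B' + b) - (A' + a)B'}{B'(B' + b)}
              = \frac{A' b - a B'}{B'(B' + b)}
              = \frac{b}{B' + b}\left[\frac{A'}{B'} - \frac{a}{b}\right].
\end{aligned}
```

Since $b/(B'+b) > 0$, the difference $g(c') - g(c)$ has the same sign as $g(c') - a/b$, which is the right side of (7).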


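If $c \ge g$, then (7) and (8) yield

$$g(c') - \frac{\int_{R-R'} R\,dH}{\int_{R-R'} t\,dH} \le g(c') - c \le g - c \le 0,$$

so $g(\cdot)$ is nonincreasing on $[g, \infty)$.

To show $g(\cdot)$ is nondecreasing on $(-\infty, g]$, it suffices by (7) and
(8) to show that $g(c) \ge c$ for all $c \le g$. By definition of $g(c)$,

$$\begin{aligned}
\left[\mu + \int_R t\,dH\right]\left[g(c) - c\right] &= \int_R (R - ct)\,dH - c\mu \\
&= \int_{R-R''} (R - ct)\,dH + \int_{R''} R\,dH - c\left(\mu + \int_{R''} t\,dH\right) \\
&= \int_{R-R''} (R - ct)\,dH + \left(1 - \frac{c}{g}\right)\int_{R''} R\,dH \ge 0,
\end{aligned}$$

where $R'' = \{x : R(x,1)/t_x \ge g\}$. The last equality follows from (6).

Q.E.D.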
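Theorem 5 suggests a simple way to compute $g$ in practice when there are finitely many offer types: evaluate $g(c)$ from (6) at each candidate threshold and take the largest value. The sketch below does this under the assumptions of a discrete type distribution and strictly positive mean service times; the names `average_reward` and `optimal_gain` are hypothetical, and the numbers are illustrative rather than taken from the paper.

```python
import math


def average_reward(c, probs, rewards, times, lam):
    """g(c) from equation (6): accept type i iff rewards[i] / times[i] >= c."""
    accepted = [(p, r, t) for p, r, t in zip(probs, rewards, times) if r / t >= c]
    numerator = sum(p * r for p, r, _ in accepted)
    denominator = 1.0 / lam + sum(p * t for p, _, t in accepted)
    return numerator / denominator


def optimal_gain(probs, rewards, times, lam):
    """Maximize g(c) over thresholds; only the distinct ratios can matter."""
    candidates = sorted({r / t for r, t in zip(rewards, times)})
    candidates.append(math.inf)  # threshold that rejects every offer
    return max(average_reward(c, probs, rewards, times, lam) for c in candidates)


if __name__ == "__main__":
    probs, rewards, times, lam = [0.5, 0.3, 0.2], [1.0, 4.0, 9.0], [1.0, 2.0, 3.0], 1.0
    g = optimal_gain(probs, rewards, times, lam)
    # The optimal threshold is g itself: accept iff reward/time >= g.
    print(g, average_reward(g, probs, rewards, times, lam))
```

Because $g(\cdot)$ is unimodal, a search over the sorted candidate thresholds could stop as soon as the value starts to decrease; the exhaustive maximum above is used only to keep the sketch short.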


Batch Arrivals

Suppose that customers arrive in batches. In particular, suppose
that (i) each arrival consists of $n$ customers with probability $p_n$
and $\sum_{n=1}^{N} p_n = 1$, (ii) the batch size and the offers are
independent of the renewal process, and (iii) the conditional distribution
of offers given that there are $n$ customers in the batch is given by
$H_n$, so

$$H_n(x_1, \ldots, x_n) = P\{i\text{th offer is} \le x_i,\ i = 1, 2, \ldots, n \mid \text{batch size is } n\}.$$

As before, the streetwalker can accept at most one customer from each
batch, and all rejected offers are lost. Transitions are defined as
before, and we say that the system is in state $(x_1, \ldots, x_n)$ if the
last offer made was the batch $x_1, \ldots, x_n$ of offers.

Recall that in the special case of Poisson arrivals it is optimal
to accept an offer of type $x$ if and only if $\rho_x \ge g$, where
$\rho_x = R(x,1)/t_x$. Consequently, this leads one to the conjecture that
with batch and Poisson arrivals, the streetwalker accepts that offer
$x_{i^*}$, if any, for which
$\rho_{x_{i^*}} = \max\{\rho_{x_i} : i = 1, 2, \ldots, n\}$. This conjecture is,
however, false, for it turns out that the relevant quantity is
$R(x,1) - t_x g$ rather than $\rho_x$. Thus, offers cannot be ranked
according to $\rho_x$ even though $\rho_x$ provides us with a simple
acceptance-rejection criterion.


Theorem 6. When batch arrivals are permitted, it is optimal to reject
all offers from the batch $x_1, \ldots, x_n$ if and only if

$$\frac{R(x_i,1)}{\tau(x_i,1) - \mu} < g \quad \text{for } i = 1, 2, \ldots, n.$$

If an offer from the batch $x_1, \ldots, x_n$ is accepted, then it is optimal
to accept that offer with the largest value of

$$R(x_i,1) - g\tau(x_i,1).$$

Proof. The arguments used in establishing Lemmas 1 and 2 suffice to
establish their analogues for the case of batch arrivals. Hence, there
is a bounded function $h$ and a constant $g$ such that for all states
$(x_1, \ldots, x_n)$ we have

$$h(x_1, \ldots, x_n) = \max\left\{ \max_{i=1,2,\ldots,n} \left[ R(x_i,1) + \sum_{n=1}^{N} p_n \int h(y_1, \ldots, y_n)\,dH_n(y_1, \ldots, y_n) - g\tau(x_i,1) \right];\ \sum_{n=1}^{N} p_n \int h(y_1, \ldots, y_n)\,dH_n(y_1, \ldots, y_n) - g\mu \right\}. \tag{9}$$

The desired results now follow as shown in the proof of Theorem 3.

Q.E.D.
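The selection rule of Theorem 6 is straightforward to express in code once $g$ is known. The sketch below is illustrative only: the function name `choose_from_batch` is hypothetical, each offer is represented by an assumed pair `(R, tau)` standing for $R(x_i,1)$ and $\tau(x_i,1)$, and the value of `g` is taken as given rather than computed. It also assumes $\tau(x_i,1) > \mu$ for every offer, so that the ratio test in the theorem is equivalent to the comparison $R(x_i,1) - g\tau(x_i,1) < -g\mu$ used here.

```python
def choose_from_batch(batch, g, mu):
    """Return the index of the offer to accept, or None to reject the batch.

    batch : list of (R, tau) pairs, one per offer in the batch
    g     : optimal long-run average reward (assumed known)
    mu    : mean interarrival time
    """
    if not batch:
        return None
    # Rank offers by R - g * tau, the quantity that Theorem 6 singles out.
    best = max(range(len(batch)), key=lambda i: batch[i][0] - g * batch[i][1])
    reward, tau = batch[best]
    # Reject everything iff even the best offer has R / (tau - mu) < g,
    # which for tau > mu is the same as R - g * tau < -g * mu.
    if reward - g * tau < -g * mu:
        return None
    return best


if __name__ == "__main__":
    g, mu = 2.0, 1.0
    # Offer 0 has the larger ratio R / (tau - mu) = 10, offer 1 has ratio 6,
    # yet offer 1 has the larger value of R - g * tau (18 versus 6).
    print(choose_from_batch([(10.0, 2.0), (30.0, 6.0)], g, mu))
    print(choose_from_batch([(1.0, 2.0), (3.0, 4.0)], g, mu))
```

The first batch in the usage example illustrates why ranking by the ratio fails: the offer with the smaller reward-to-time ratio is still the better choice because its reward net of the opportunity cost $g\tau$ is larger.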


Key words: job shop models; queueing reward system; semi-Markov decision processes.